Goto

Collaborating Authors

 surrogate loss







Calibration and Consistency of Adversarial Surrogate Losses

Neural Information Processing Systems

Adversarial robustness is an increasingly critical property of classifiers in applications. The design of robust algorithms relies on surrogate losses since the optimization of the adversarial loss with most hypothesis sets is NP-hard.


Fundamental Novel Consistency Theory: $H$-Consistency Bounds

Zhong, Yutao

arXiv.org Machine Learning

In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance due to computational intractability or lack of differentiability. We present an in-depth study of the target loss estimation error relative to the surrogate loss estimation error. Our analysis leads to $H$-consistency bounds, which are guarantees accounting for the hypothesis set $H$. These bounds offer stronger guarantees than Bayes-consistency or $H$-calibration and are more informative than excess error bounds. We begin with binary classification, establishing tight distribution-dependent and -independent bounds. We provide explicit bounds for convex surrogates (including linear models and neural networks) and analyze the adversarial setting for surrogates like $ρ$-margin and sigmoid loss. Extending to multi-class classification, we present the first $H$-consistency bounds for max, sum, and constrained losses, covering both non-adversarial and adversarial scenarios. We demonstrate that in some cases, non-trivial $H$-consistency bounds are unattainable. We also investigate comp-sum losses (e.g., cross-entropy, MAE), deriving their first $H$-consistency bounds and introducing smooth adversarial variants that yield robust learning algorithms. We develop a comprehensive framework for deriving these bounds across various surrogates, introducing new characterizations for constrained and comp-sum losses. Finally, we examine the growth rates of $H$-consistency bounds, establishing a universal square-root growth rate for smooth surrogates in binary and multi-class tasks, and analyze minimizability gaps to guide surrogate selection.


Theory and Algorithms for Learning with Multi-Class Abstention and Multi-Expert Deferral

Mao, Anqi

arXiv.org Machine Learning

Large language models (LLMs) have achieved remarkable performance but face critical challenges: hallucinations and high inference costs. Leveraging multiple experts offers a solution: deferring uncertain inputs to more capable experts improves reliability, while routing simpler queries to smaller, distilled models enhances efficiency. This motivates the problem of learning with multiple-expert deferral. This thesis presents a comprehensive study of this problem and the related problem of learning with abstention, supported by strong consistency guarantees. First, for learning with abstention (a special case of deferral), we analyze score-based and predictor-rejector formulations in multi-class classification. We introduce new families of surrogate losses and prove strong non-asymptotic, hypothesis set-specific consistency guarantees, resolving two existing open questions. We analyze both single-stage and practical two-stage settings, with experiments on CIFAR-10, CIFAR-100, and SVHN demonstrating the superior performance of our algorithms. Second, we address general multi-expert deferral in classification. We design new surrogate losses for both single-stage and two-stage scenarios and prove they benefit from strong $H$-consistency bounds. For the two-stage scenario, we show that our surrogate losses are realizable $H$-consistent for constant cost functions, leading to effective new algorithms. Finally, we introduce a novel framework for regression with deferral to address continuous label spaces. Our versatile framework accommodates multiple experts and various cost structures, supporting both single-stage and two-stage methods. It subsumes recent work on regression with abstention. We propose new surrogate losses with proven $H$-consistency and demonstrate the empirical effectiveness of the resulting algorithms.



Trading off Consistency and Dimensionality of Convex Surrogates for Multiclass Classification

Neural Information Processing Systems

In this paradigm, outcomes must be embedded into the reals with dimension $d \approx n$ in order to design a consistent surrogate loss. Consistent losses are well-motivated theoretically, yet for large $n$, such as in information retrieval and structured prediction tasks, their optimization may be computationally infeasible. In practice, outcomes are typically embedded into some $\mathbb{R}^d$ for $d \ll n$, with little known about their suitability for multiclass classification. We investigate two approaches for trading off consistency and dimensionality in multiclass classification while using a convex surrogate loss. We first formalize partial consistency when the optimized surrogate has dimension $d \ll n$. We then check if partial consistency holds under a given embedding and low-noise assumption, providing insight into when to use a particular embedding into $\mathbb{R}^d$. Finally, we present a new method to construct (fully) consistent losses with $d \ll n$ out of multiple problem instances. Our practical approach leverages parallelism to sidestep lower bounds on $d$.